16 research outputs found

    TTS evaluation campaign with a common Spanish database

    This paper describes the first TTS evaluation campaign designed for Spanish. Seven research institutions took part and each developed a voice from a common speech database provided by the organisation. Each participating team had seven weeks to generate a voice. Next, a set of sentences was released and each team had to synthesise them within one week. Finally, some of the synthesised test audio files were subjectively evaluated via an online test according to the following criteria: similarity to the original voice, naturalness and intelligibility. Box-plots, Wilcoxon tests and WER were used to analyse the results. Two main conclusions can be drawn. On the one hand, there is considerable margin for improvement before the quality level of the natural voice is reached. On the other hand, two systems obtain significantly better results than the rest: one is based on statistical parametric synthesis and the other is a concatenative system that uses a sinusoidal model both to modify prosody and to smooth spectral joints. Therefore, it seems that some kind of spectral control is needed when building voices with a medium-sized database for unrestricted domains.
    Postprint (published version)

    The strategic impact of META-NET on the regional, national and international level

    This article provides an overview of the dissemination work carried out in META-NET from 2010 until 2015; we describe its impact on the regional, national and international level, mainly with regard to politics and the funding situation for LT topics. The article documents the initiative's work throughout Europe in order to boost progress and innovation in our field.
    Peer reviewed. Postprint (author's final draft)

    Searching for an Optimal Reference System for On-line Signature Verification based on (x, y) Alignment

    The handwritten signature is the expression of will and consent in daily operations such as banking transactions, access control, contracts, etc. However, since signing is a behavioral feature, it is not invariant: we do not always sign at the same speed, in the same position or at the same orientation. In order to reduce the verification errors caused by these differences between genuine signatures, we have introduced a new concept of reference system for the (x, y) coordinates, derived from experimental results. The basis of this concept lies in using global references (centre of mass and principal axes of inertia) instead of local references (initial point and initial angle) for a recognition system based on local parameters. The system is based on the hypothesis that signing is a feedback process, in which humans react to their own signature while writing it, following patterns stored in the brain.
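
The global reference system named in the abstract above (centre of mass plus principal axes of inertia) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `normalize_signature` is hypothetical, every sample is assumed to have equal weight, and the principal axes are taken from the second central moments of the (x, y) samples.

```python
import math


def normalize_signature(points):
    """Re-express (x, y) samples in a global frame (illustrative only).

    Assumption: equal-weight samples; the origin moves to the centre of
    mass and the x axis is aligned with the major principal axis.
    """
    n = len(points)
    # Centre of mass of the pen trajectory.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centred = [(x - cx, y - cy) for x, y in points]

    # Second central moments (entries of the 2x2 covariance matrix).
    sxx = sum(x * x for x, _ in centred) / n
    syy = sum(y * y for _, y in centred) / n
    sxy = sum(x * y for x, y in centred) / n

    # Orientation of the major principal axis relative to the x axis.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)

    # Rotate by -theta so the major axis lies along x.
    return [(x * c + y * s, -x * s + y * c) for x, y in centred]
```

Because the frame depends on all samples rather than on the first point and initial angle, shifting or rotating the whole signature leaves the normalised coordinates unchanged, up to a 180-degree ambiguity in the axis direction that a real system would have to resolve separately.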

    Another Step in the Modeling of Basque Intonation: Bermeo

    In this paper the basic features of the intonational structure of Bermeo Basque (BB) are analyzed. In BB there is a lexical distinction between accented and unaccented words. Accented words are always stressed, whereas unaccented words (those not containing any accented morphemes) receive stress on their final syllable only when they immediately precede the verb; otherwise they surface stressless. BB shows a hierarchically organized intonational structure. Accentual Phrases (APs) are identified by an initial %L boundary tone, a phrasal H- tone associated with the second syllable, and a nuclear H* pitch accent. H- spreads rightwards to other syllables until an H* pitch accent is met. An H* pitch accent triggers downstep on the following H*. An Intermediate Phrase contains one or more APs, and is the domain of downstep. Finally, an Intonational Phrase is signalled by a final L% in declarative